-
Abstract In this study, we investigate the application of the New Physics Learning Machine (NPLM) algorithm as an alternative to the standard CWoLa (Classification Without Labels) method with Boosted Decision Trees (BDTs), particularly for scenarios with rare signal events. NPLM offers an end-to-end approach to anomaly detection and hypothesis testing by evaluating a binary classifier in-sample to estimate a log-density ratio, which can improve detection performance without prior assumptions on the signal model. We examine two approaches: (1) an end-to-end NPLM application in cases with reliable background modelling and (2) an NPLM-based classifier used for signal selection when accurate background modelling is unavailable, with performance subsequently enhanced through a hyper-test over multiple values of the selection threshold. Our findings show that NPLM-based methods outperform BDT-based approaches in detection performance, particularly in low signal injection scenarios, while significantly reducing epistemic variance due to hyperparameter choices. This work highlights the potential of NPLM for robust resonant anomaly detection in particle physics, setting a foundation for future methods that enhance sensitivity and consistency under signal variability.
Free, publicly-accessible full text available September 1, 2026
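The in-sample test statistic mentioned above can be illustrated with a minimal sketch. It assumes the extended-likelihood loss of the original NPLM formulation, L[f] = sum over reference events of w (exp(f(x)) - 1) minus sum over data events of f(x), with the reference weight approximated here as w = N_D / N_R (an assumption), and the statistic t = -2 min_f L[f] computed on the same events used for training. The one-dimensional Gaussian samples, the linear model f(x) = a + b*x (standing in for a flexible classifier such as a neural network), and all function names are illustrative assumptions, not the paper's setup.

```python
import math
import random


def nplm_loss_and_grad(params, ref, data):
    """Normalized NPLM loss for f(x) = a + b*x, and its gradient.

    L[f] / N_D = mean_R(exp(f) - 1) - mean_D(f), i.e. the extended-likelihood
    loss with reference weights w = N_D / N_R (an assumption of this sketch).
    """
    a, b = params
    exp_f = [math.exp(a + b * x) for x in ref]
    mean_exp = sum(exp_f) / len(ref)
    mean_exp_x = sum(e * x for e, x in zip(exp_f, ref)) / len(ref)
    mean_data_x = sum(data) / len(data)
    loss = (mean_exp - 1.0) - (a + b * mean_data_x)
    grad = (mean_exp - 1.0, mean_exp_x - mean_data_x)
    return loss, grad


def nplm_test_statistic(ref, data, lr=0.5, steps=100):
    """t = -2 * min_f L[f], with f trained and evaluated on the same (in-sample) events."""
    params = [0.0, 0.0]  # f = 0 means "data follow the reference"; the loss is 0 there
    for _ in range(steps):
        _, (ga, gb) = nplm_loss_and_grad(params, ref, data)
        params[0] -= lr * ga
        params[1] -= lr * gb
    loss, _ = nplm_loss_and_grad(params, ref, data)
    return -2.0 * len(data) * loss, params  # undo the 1/N_D normalization


rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # reference (background-only) sample
null = [rng.gauss(0.0, 1.0) for _ in range(1000)]     # data with no anomaly
shifted = [rng.gauss(0.3, 1.0) for _ in range(1000)]  # data with a distorted distribution
t_null, _ = nplm_test_statistic(ref, null)
t_shift, fit = nplm_test_statistic(ref, shifted)
print(f"t_null = {t_null:.1f}, t_shift = {t_shift:.1f}, f(x) = {fit[0]:.3f} + {fit[1]:.3f} x")
```

Because the loss is zero at f = 0 and gradient descent only lowers it, t is non-negative, and it grows when the data depart from the reference in a way the model f can capture.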
-
Free, publicly-accessible full text available November 1, 2026
-
Abstract We introduce SymbolFit (API: https://github.com/hftsoi/symbolfit), a framework that automates parametric modeling by using symbolic regression to perform a machine search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when no true closed-form function underlies the distribution. In this work, we develop a framework that automates and streamlines this process using symbolic regression, a machine learning technique that explores a vast space of candidate functions without requiring a predefined functional form, because the functional form itself is treated as a trainable parameter. This makes the process far more efficient and less labor-intensive than traditional regression methods. We demonstrate the framework in high-energy physics experiments at the CERN Large Hadron Collider (LHC) using five real proton-proton collision datasets from new physics searches, including background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We show that our framework can flexibly and efficiently generate a wide range of candidate functions that fit a nontrivial distribution well using a simple fit configuration that varies only by random seed, and that the same fit configuration, which defines a vast function space, can also be applied to distributions of different shapes, whereas achieving a comparable result with traditional methods would have required extensive manual effort.
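A toy version of treating the functional form itself as the thing being searched can be sketched with pure random search over expression trees, scored by a chi-square with Poisson (sqrt N) bin uncertainties. This is an assumption-laden stand-in: SymbolFit's actual search backend, operator set, and uncertainty estimation are not reproduced, and the operators, depth limit, and names such as `search` are hypothetical. As in the abstract, the only setting varied between runs is the random seed.

```python
import math
import random

# Illustrative operator set (hypothetical, not SymbolFit's configuration)
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}
UNARY = {
    "exp": lambda a: math.exp(min(a, 50.0)),  # clamped to avoid overflow
    "log": lambda a: math.log(abs(a)) if abs(a) > 1e-9 else 0.0,  # protected log
}


def random_tree(rng, depth):
    """Grow a random expression tree: leaves are 'x' or constants, nodes are tuples."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-5.0, 5.0), 2)
    if rng.random() < 0.3:
        return (rng.choice(list(UNARY)), random_tree(rng, depth - 1))
    return (rng.choice(list(BINARY)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    if len(tree) == 2:
        return UNARY[tree[0]](evaluate(tree[1], x))
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def to_str(tree):
    if not isinstance(tree, tuple):
        return str(tree)
    if len(tree) == 2:
        return f"{tree[0]}({to_str(tree[1])})"
    return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"


def chi2(tree, centers, counts):
    """Chi-square with Poisson (sqrt N) uncertainties per bin."""
    total = 0.0
    for x, n in zip(centers, counts):
        r = n - evaluate(tree, x)
        total += r * r / max(n, 1.0)  # r * r yields inf instead of raising on overflow
    return total


def search(centers, counts, seed, n_trials=2000, keep=5, max_depth=3):
    rng = random.Random(seed)  # the only knob varied between runs
    scored = []
    for _ in range(n_trials):
        tree = random_tree(rng, max_depth)
        score = chi2(tree, centers, counts)
        if math.isfinite(score):
            scored.append((score, to_str(tree)))
    scored.sort()
    return scored[:keep]  # best candidates, lowest chi-square first


centers = [0.25 + 0.5 * i for i in range(10)]
counts = [round(1000 * math.exp(-0.8 * x)) for x in centers]  # toy falling spectrum
best = search(centers, counts, seed=1)
for score, expr in best[:3]:
    print(f"chi2 = {score:10.1f}   f(x) = {expr}")
```

Random search keeps the sketch short but finds good fits far less reliably than the evolutionary search that practical symbolic-regression tools use.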
-
The increasing computational demand from growing data rates and complex machine learning (ML) algorithms in large-scale scientific experiments has driven the adoption of the Services for Optimized Network Inference on Coprocessors (SONIC) approach. SONIC accelerates ML inference by offloading it to local or remote coprocessors to optimize resource utilization. Leveraging its portability to different types of coprocessors, SONIC enhances data processing and model deployment efficiency for cutting-edge research in high energy physics (HEP) and multi-messenger astrophysics (MMA). We developed the SuperSONIC project, a scalable server infrastructure for SONIC, enabling the deployment of computationally intensive tasks to Kubernetes clusters equipped with graphics processing units (GPUs). Using NVIDIA Triton Inference Server, SuperSONIC decouples client workflows from server infrastructure, standardizing communication, optimizing throughput, and providing load balancing and monitoring. SuperSONIC has been successfully deployed for the CMS and ATLAS experiments at the CERN Large Hadron Collider (LHC), the IceCube Neutrino Observatory (IceCube), and the Laser Interferometer Gravitational-Wave Observatory (LIGO), and has been tested on Kubernetes clusters at Purdue University, the National Research Platform (NRP), and the University of Chicago. SuperSONIC addresses the challenges of the cloud-native era by providing a reusable, configurable framework that enhances the efficiency of accelerator-based inference deployment across diverse scientific domains and industries.
Free, publicly-accessible full text available July 18, 2026
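The decoupling described above, where clients send requests to a single endpoint and a balancer spreads them over several GPU servers, can be sketched with the standard library. `ToyServer` and `RoundRobinBalancer` are hypothetical stand-ins, not the Triton Inference Server or Kubernetes APIs; round-robin is just one possible balancing policy, and the per-server counters play the role of a monitoring hook.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyServer:
    """Stand-in for one GPU inference server (hypothetical; not the Triton API)."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.served = 0
        self._lock = threading.Lock()

    def infer(self, batch):
        with self._lock:
            self.served += 1  # per-server counter, as a monitoring hook might record
        return [self.model(x) for x in batch]


class RoundRobinBalancer:
    """Clients talk only to this object, never to a specific server."""

    def __init__(self, servers):
        self.servers = servers
        self._cycle = itertools.cycle(servers)
        self._lock = threading.Lock()

    def infer(self, batch):
        with self._lock:
            server = next(self._cycle)  # pick the next server in turn
        return server.infer(batch)

    def stats(self):
        return {s.name: s.served for s in self.servers}


servers = [ToyServer(f"gpu{i}", lambda x: 2 * x) for i in range(3)]
balancer = RoundRobinBalancer(servers)
batches = [[i, i + 1] for i in range(9)]
with ThreadPoolExecutor(max_workers=4) as pool:  # several concurrent client requests
    results = list(pool.map(balancer.infer, batches))
print(results[0], balancer.stats())
```

Swapping the balancing policy (for example, least-loaded instead of round-robin) changes only `RoundRobinBalancer.infer`; client code calling `balancer.infer` is unaffected, which is the point of the decoupling.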
-
Abstract Compact symbolic expressions have been shown to be more efficient than neural network (NN) models in terms of resource consumption and inference speed when implemented on custom hardware such as field-programmable gate arrays (FPGAs), while maintaining comparable accuracy (Tsoi et al 2024 EPJ Web Conf. 295 09036). These capabilities are highly valuable in environments with stringent computational resource constraints, such as high-energy physics experiments at the CERN Large Hadron Collider. However, finding compact expressions for high-dimensional datasets remains challenging due to the inherent limitations of genetic programming (GP), the search algorithm of most symbolic regression (SR) methods. In contrast to GP, the NN approach to SR offers scalability to high-dimensional inputs and leverages gradient methods for faster equation searching. Common ways of constraining expression complexity often involve multistage pruning with fine-tuning, which can result in significant performance loss. In this work, we propose , a NN approach to SR specifically designed as a model compression technique, aimed at enabling low-latency inference for high-dimensional inputs on custom hardware such as FPGAs. This framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process, where both training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term for each pruning type, which can adaptively adjust its strength, leading to convergence at a target sparsity ratio. Unlike most existing SR methods that struggle with datasets containing more than inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).
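The adaptive sparsity regularization can be illustrated with a toy proximal-gradient loop in which the L1 strength `lam` is raised while the weights are denser than the target and lowered when they are sparser. The multiplicative update rule, the quadratic stand-in for the training loss, and all names and constants are assumptions of this sketch rather than the paper's schedule; the paper applies separate terms to weights, input features, and operators, whereas only one term is shown here.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks toward zero and zeroes small values."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def train_with_adaptive_l1(w_dense, target_sparsity, steps=500, lr=0.5, eta=0.2, lam=0.01):
    """Toy proximal gradient descent whose L1 strength adapts toward a target sparsity.

    The task loss is 0.5 * ||w - w_dense||^2, a stand-in for a real training loss.
    """
    w = list(w_dense)
    sparsity = 0.0
    for _ in range(steps):
        # Gradient step on the task loss, then the L1 proximal (pruning) step
        w = [soft_threshold(wi - lr * (wi - di), lr * lam) for wi, di in zip(w, w_dense)]
        sparsity = sum(1 for wi in w if wi == 0.0) / len(w)
        # Strengthen the penalty when too dense, relax it when too sparse
        lam *= math.exp(eta * (target_sparsity - sparsity))
    return w, sparsity, lam


w_dense = [(i + 1) / 20 for i in range(20)]  # toy dense optimum, values 0.05 ... 1.0
w, sparsity, lam = train_with_adaptive_l1(w_dense, target_sparsity=0.7)
print(f"sparsity = {sparsity:.2f}, lambda = {lam:.3f}")
```

Because the update factor is exactly 1 when the measured sparsity equals the target, the strength stops changing once the target is reached, which mimics the convergence behavior the abstract describes.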
-
Abstract The promise of multi-messenger astronomy relies on the rapid detection of gravitational waves at very low latencies (O(1 s)) in order to maximize the amount of time available for follow-up observations. In recent years, neural networks have demonstrated robust non-linear modeling capabilities and millisecond-scale inference at a comparatively small computational footprint, making them an attractive family of algorithms in this context. However, integrating these algorithms into the gravitational-wave astrophysics research ecosystem has proven non-trivial. Here, we present the first fully machine learning-based pipeline for the detection of gravitational waves from compact binary coalescences (CBCs) that runs at low latency. We demonstrate that this pipeline has a fraction of the latency of traditional matched-filtering search pipelines while achieving state-of-the-art sensitivity to higher-mass stellar binary black holes.
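Low-latency operation implies processing data as it streams in rather than in offline batches. A minimal sketch of that online pattern keeps a fixed-length buffer of the newest samples and scores an overlapping window every `stride` samples. The window and stride sizes, the toy `model` (a peak-amplitude score standing in for a neural network), and the trigger threshold are hypothetical; the pipeline's actual data conditioning and architecture are not reproduced.

```python
from collections import deque


def stream_inference(samples, model, window=8, stride=4):
    """Run `model` on overlapping windows as samples arrive (toy online loop).

    Yields (index_of_last_sample, score) once every `stride` new samples after the
    buffer first fills, so each score is available as soon as its window is complete.
    """
    buffer = deque(maxlen=window)  # oldest samples drop out automatically
    for i, s in enumerate(samples):
        buffer.append(s)
        if len(buffer) == window and (i + 1 - window) % stride == 0:
            yield i, model(list(buffer))


def peak_model(w):
    """Hypothetical stand-in for a trained network: peak absolute amplitude."""
    return max(abs(x) for x in w)


samples = [0.0] * 20 + [1.0, 5.0, 1.0] + [0.0] * 9  # toy stream with a transient
results = list(stream_inference(samples, peak_model))
triggers = [i for i, score in results if score > 3.0]
print(triggers)
```

The overlap between windows (window larger than stride) means a short transient is seen by more than one window, at the cost of running the model more often.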
-
Abstract In this paper, we present a method of embedding physics data manifolds with metric structure into lower-dimensional spaces with simpler metrics, such as Euclidean and hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide, for the first time, a viable way to quantify the true search capability of model-agnostic search algorithms in collider physics (i.e., anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high-dimensional datasets.
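The embedding step can be sketched as stress minimization: place points in a low-dimensional Euclidean space so that their pairwise distances match those of the original metric space, with the Poincaré-ball distance shown as the hyperbolic alternative. This assumes a precomputed distance matrix and fits point coordinates directly by gradient descent, which may differ from how the paper constructs its embedding; the toy points and all function names are illustrative.

```python
import math
import random


def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball model of hyperbolic space (|u|, |v| < 1)."""
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * diff2 / ((1.0 - nu) * (1.0 - nv)))


def stress(Y, D):
    """Sum of squared mismatches between embedded and target pairwise distances."""
    n = len(Y)
    return sum((math.dist(Y[i], Y[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def euclidean_embed(D, dim=2, steps=500, lr=0.02, seed=0):
    """Fit points in R^dim whose pairwise distances match D by gradient descent on stress."""
    rng = random.Random(seed)
    n = len(D)
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(Y[i], Y[j]) or 1e-12  # avoid division by zero
                coef = 2.0 * (d - D[i][j]) / d
                for k in range(dim):
                    grads[i][k] += coef * (Y[i][k] - Y[j][k])
        Y = [[y - lr * g for y, g in zip(Yi, Gi)] for Yi, Gi in zip(Y, grads)]
    return Y


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]
D = [[math.dist(p, q) for q in points] for p in points]  # toy target metric
Y0 = euclidean_embed(D, steps=0)  # same seed, so this is the starting configuration
Y = euclidean_embed(D)
print(f"stress: {stress(Y0, D):.4f} -> {stress(Y, D):.4f}")
# d(0, (0.5, 0)) = 2 * artanh(0.5) = ln 3
print(f"hyperbolic d(0, (0.5, 0)) = {poincare_distance((0.0, 0.0), (0.5, 0.0)):.4f}")
```

Replacing `math.dist` with `poincare_distance` (and keeping points inside the unit ball) turns the same objective into a hyperbolic embedding, which suits data with tree-like, hierarchical structure.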
